Abstract

The statistical mechanical approach to complex networks is the dominant paradigm in describing natural
and societal complex systems. The study of network properties, and of their implications for dynamical
processes, mostly focuses on locally defined quantities of nodes and edges, such as node degrees, edge
weights and, more recently, correlations between neighboring nodes. However, statistical methods quickly
become cumbersome when dealing with many-body properties and do not capture the precise mesoscopic
structure of complex networks.

Here we introduce a novel method, based on persistent homology, to detect particular non-local
structures, akin to weighted holes within the link-weight network fabric, which are invisible to existing
methods. Their properties divide weighted networks into two broad classes: one is characterized by small,
hierarchically nested holes, while the second displays larger and longer-living inhomogeneities. These
classes cannot be reduced to known local or quasilocal network properties, because of the intrinsic
non-locality of homological properties, and thus yield a new classification built on high-order
coordination patterns. Our results show that topology can provide novel insights relevant for many-body
interactions in social and spatial networks. Moreover, this new method creates the first bridge between
network theory and algebraic topology, which will allow the toolset of algebraic methods to be imported
into the study of complex systems.

1 Introduction 

Complex networks have become one of the prominent tools in the study of social, technological and
biological systems [1, 2, 3]. In particular, weighted networks have been widely used to convey not only
the presence but also the intensity of relations between nodes in a network. Real-world networks, however,
display intricate patterns of redundant links, with edge weights and node degrees usually ranging over
several orders of magnitude [4, 5]. This makes it very hard to extract the significant network structure
from the background [6, 7, 8, 9], especially in the case of very dense networks [10, 11]. Alongside
topological filtering methods [13, 12], the typical approach to this problem is to choose a suitable
threshold for the edge weights, e.g. global [10] or local [14], and study the reduced graph composed of
only the edges of weight larger (smaller) than the threshold parameter. In any case, some properties of
the original graph are inevitably lost under such a transformation.

To avoid this pitfall, we propose to consider the set of all filtered networks, ordered by the descending
thresholding weight parameter, in the spirit of persistent homology [15, 16, 17].

This set, which we call the graph filtration, combines link weights and connectivity structure over all
weight scales. The graph filtration proceeds on the network $\Omega$ following these steps:

• Rank the weights of links from $\omega_{max}$ to $\omega_{min}$: the discrete parameter $\epsilon_t$ scans the sequence.

• At each step $t$ of the decreasing edge ranking we consider the thresholded graph $G(\omega_{ij}, \epsilon_t)$, i.e. the
subgraph of $\Omega$ with links of weight larger than $\epsilon_t$.
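
The two steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `graph_filtration`, the toy weights, and the choice to include an edge as soon as the threshold reaches its own weight (weight >= threshold) are assumptions made for the example.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def graph_filtration(weights: Dict[Edge, float]) -> List[Tuple[float, Set[Edge]]]:
    """Return [(eps_t, edge set of the thresholded graph)] for descending eps_t."""
    # Distinct weights, ranked from the largest to the smallest.
    thresholds = sorted(set(weights.values()), reverse=True)
    steps = []
    for eps in thresholds:
        # Assumption: an edge enters the filtration when the threshold reaches its weight.
        edges = {e for e, w in weights.items() if w >= eps}
        steps.append((eps, edges))
    return steps


# Hypothetical toy network: a strong link, two medium links and one weak link.
weights = {("a", "b"): 5.0, ("b", "c"): 3.0, ("a", "c"): 3.0, ("c", "d"): 1.0}
for eps, edges in graph_filtration(weights):
    print(eps, sorted(edges))
```

Each thresholded graph contains the previous one, so the sequence is nested, which is the property that persistent homology requires of a filtration.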
Figure 1a provides a schematic illustration of the rank filtration. This approach preserves the complete
topological and weight information, allowing us to focus on special mesoscopic structures: weighted
network holes, which relate the network's weight-degree structure to its homological backbone.
A weighted network hole of weight $\omega$ is a loop composed of $n$ nodes $i_0, i_1, i_2, \ldots, i_{n-1}$, where all cyclic
edges (with $i_0 \equiv i_n$) have weights $\geq \omega$, while all the other possible edges crossing the loop are
strictly weaker than $\omega$. We focus on this special class of subgraphs because, formally, such weighted holes
are generators of the first homology group, $H_1$, of the clique complex of the graph thresholded by weight
$\omega$ (see Materials and Methods). The aim of this paper is to characterize the evolution of these generators
along the network filtration. As we sweep the network from the largest to the smallest weights, network
holes appear and potentially close.

By unearthing their properties, we obtain the main contribution of this paper: the statistical features
of weighted network holes yield a classification of real-world networks into two classes, depending on
their compatibility, or lack thereof, with null models generated by graph randomisations. Furthermore,
this classification is defined by mesoscopic homological structures that cannot be reduced to local
properties alone.

The method used for the classification itself, which we call weighted clique rank homology, is the second
main novel contribution of this paper. It allows the recovery of complete and accurate long-range
information from noisy, redundant network data by building on persistent homology [16], a recent theory
developed in computational topology [17], which we extend to the case of networks.

Each weighted hole $g$ is characterized by three quantities: its birth index $\beta_g$, its persistence $p_g$ and its
length $\lambda_g$. After ranking links in descending order according to their weights, the birth index of a hole
is the rank $t$ of its weight $\omega$. As we proceed adding links to the filtration in ranking order, it is possible
that a link with rank $t' > t$ will appear and cross the hole. We call this the closure of the weighted hole,
or its death $\delta_g$. The persistence $p_g$ is the interval between the birth and death of $g$, $p_g = \delta_g - \beta_g = t' - t$.
Finally, the length $\lambda_g$ is the number of links composing $g$.

As in stratigraphy, each step of the filtration is a topological stratum of the network, where the
edge weight rank plays the role of depth. Intuitively, $g$ can then be thought of as an underground cavity
hidden in the link-weight fabric of the network, and $\beta_g$, $p_g$ and $\lambda_g$ as its maximal depth, vertical size and
girth respectively.



2 Results 

2.1 Homological network classes 

We applied this analysis to various social, infrastructural and biological networks (see SI for a detailed
list). In order to compare datasets, indices are normalized by the corresponding filtration length (maximal
rank) $T$, so that all $\beta_g$, $\delta_g$, and thus $p_g$, vary in the unit interval. In addition, we compared each dataset
with two randomized versions, obtained by weight reshuffling and edge swapping respectively. Both
randomisations preserve the weight and degree sequences. The first redistributes the edge weights
and is meant to destroy weight correlations while preserving the linking patterns and thus the degree
assortativity. The second randomizes the network through double-edge swaps, destroying both
weight and degree correlations [22]. We stress that, as the degree and weight sequences are preserved in
the randomisations, they cannot account for the differences in the observed homology.
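
As an illustration of the two null models just described, the following Python sketch implements a weight reshuffling and a generic double-edge swap on an undirected weighted edge list. It is a sketch under stated assumptions rather than the procedure of [22]: the function names, the rule that each weight travels with its rewired edge, and the toy ring network are all hypothetical choices for the example.

```python
import random
from typing import Dict, Tuple

Edge = Tuple[int, int]


def reshuffle_weights(weights: Dict[Edge, float], rng: random.Random) -> Dict[Edge, float]:
    """Keep the links fixed and permute the weights among them."""
    values = list(weights.values())
    rng.shuffle(values)
    return dict(zip(weights, values))


def double_edge_swap(weights: Dict[Edge, float], n_swaps: int, rng: random.Random,
                     max_tries: int = 10_000) -> Dict[Edge, float]:
    """Rewire (a,b),(c,d) -> (a,d),(c,b); each weight travels with its edge (assumption)."""
    w = {tuple(sorted(e)): v for e, v in weights.items()}
    done = tries = 0
    while done < n_swaps and tries < max_tries:
        tries += 1
        (a, b), (c, d) = rng.sample(list(w), 2)
        if rng.random() < 0.5:
            c, d = d, c  # consider both orientations of the second edge
        new1, new2 = tuple(sorted((a, d))), tuple(sorted((c, b)))
        # Reject swaps that would create self-loops or duplicate links.
        if len({a, b, c, d}) < 4 or new1 in w or new2 in w:
            continue
        w[new1] = w.pop(tuple(sorted((a, b))))
        w[new2] = w.pop(tuple(sorted((c, d))))
        done += 1
    return w


def degrees(weights: Dict[Edge, float]) -> Dict[int, int]:
    """Node degrees of the (unweighted) link pattern."""
    deg: Dict[int, int] = {}
    for a, b in weights:
        deg[a] = deg.get(a, 0) + 1
        deg[b] = deg.get(b, 0) + 1
    return deg


rng = random.Random(0)
original = {(0, 1): 4.0, (1, 2): 3.0, (2, 3): 2.0, (3, 4): 1.0, (4, 5): 5.0, (0, 5): 6.0}
shuffled = reshuffle_weights(original, rng)
rewired = double_edge_swap(original, n_swaps=3, rng=rng)
print(degrees(rewired) == degrees(original))
```

Both functions leave the multiset of weights and the degree of every node unchanged, which is the invariance the text relies on when comparing real and randomized homology.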
The statistical distributions obtained for $\{\beta_g\}$, $\{p_g\}$ and $\{\lambda_g\}$ for $H_1$ cycles highlight a natural division
of the analysed networks into two broad classes (Fig. 2):

Class I networks: cycle distributions are markedly different from those of the randomized versions
(cycles display shorter persistence times, earlier and broader birth distributions, and very short
lengths as compared to their randomized versions);

Class II networks: cycle distributions are very close to those of their random versions (late
appearance, short persistences, long cycles).



The short cycles of class I networks nest hierarchically and appear and die over all scales, while those in
the randomized counterparts are born uniformly along the filtration but are more persistent, producing
largely hollow network instances. The implications are twofold. First, since cycles represent regions of
weaker connectivity, class I networks are more solid than their randomized versions, while class II
networks resemble the randomized instances more closely. Second, since the cycle abundance ratio
between real and random instances is the same in the two groups, the differences between class I and II
do not depend on cycle abundance, but rather on the properties of the cycles.

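This can be seen easily by compressing the whole information into two scalar metrics which do not
depend on the number of generators in a given network filtration. We define the network hollowness $h_k$
and the chain-length normalized hollowness $\tilde{h}_k$ as:

$$h_k = \frac{1}{N_{g_k}} \sum_{g \in \{g_k\}} \frac{p_g}{T} \qquad (2.1)$$

$$\tilde{h}_k = \frac{1}{N_{g_k}} \sum_{g \in \{g_k\}} \frac{\lambda_g}{N} \frac{p_g}{T} \qquad (2.2)$$

where $\{g_k\}$ is the set of generators of the k-th homology group, $N_{g_k} = \dim H_k$ their number, $p_g / T$ the
persistence normalized by the filtration length $T$, and $N$ the number of nodes.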
The first is a measure of the average persistence, while the second weights generators according to both
their length and persistence. Table 1 reports the values of $h_1$ and $\tilde{h}_1$. Class I networks have lower
hollowness values than their randomized versions, while class II ones show comparable values.
Interestingly, the hollowness values for the $H_2$ generators mostly vanish for the randomized instances
(Table 1), as opposed to the case of real networks. It appears that, while persistent one-dimensional
cycles are more easily generated in the randomized instances, higher forms of network coordination, e.g.
$H_2$ generators (akin to two-dimensional surfaces bounding three-dimensional voids), do not merely display
different properties in comparison to the real network, but are wiped away altogether. These findings
therefore hint at the presence of higher-order coordination mechanisms in real-world networks.
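
To make the two hollowness measures concrete, here is a small Python sketch. It assumes that the hollowness is the mean persistence normalized by the filtration length T, and that the length-normalized variant additionally weights each generator by its length divided by the number of nodes N; the second normalization in particular is an assumption of this example, as are the function names and the toy generator list.

```python
from typing import List, Tuple

# A generator is described here by (birth rank, death rank, length in links).
Generator = Tuple[int, int, int]


def hollowness(gens: List[Generator], T: int) -> float:
    """Mean persistence of the generators, normalized by the filtration length T."""
    if not gens:
        return 0.0
    return sum((death - birth) / T for birth, death, _ in gens) / len(gens)


def normalized_hollowness(gens: List[Generator], T: int, N: int) -> float:
    """Mean of (length / N) * (persistence / T); the 1/N factor is an assumption."""
    if not gens:
        return 0.0
    return sum((length / N) * (death - birth) / T for birth, death, length in gens) / len(gens)


# Hypothetical example: two H_1 generators in a filtration of length 20 on 10 nodes.
gens = [(2, 10, 4), (5, 7, 3)]
h1 = hollowness(gens, T=20)
h1_tilde = normalized_hollowness(gens, T=20, N=10)
print(h1, h1_tilde)
```

Dividing by the number of generators is what makes both quantities independent of how many cycles a network has, so that two networks with very different cycle counts can still be compared.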



Naturally, the two network classes do not represent a binary taxonomy and should be considered as
two extremes of a range over which networks are distributed. For example, we find networks that
interpolate between these classes, e.g. the online messages network has short persistence intervals, but
also late cycle appearances and short cycle lengths. However, the classes do not display uniform behavior
for local and two-body quantities: degree and weight distributions and correlations are mixed within the
same group and do not provide a direct explanation for the nature of the two classes. Similarly, a recently
proposed measure of structural organisation, integrativeness [23], which measures the neighborhood overlap
around strong links, does not explain class I, since within the latter one finds both integrative
and dispersive networks.

Finally, the classes do not show a consistent pattern in assortativity: for example, class I includes the gene
network (assortative) and the airport networks (disassortative), while class II includes the assortative
co-authorship networks and the disassortative Twitter data. Therefore, assortativity cannot be the
discriminating factor between classes.



2.2 Higher order organization 

Because homology is essentially a non-local property, it was to be expected that the local measures just
mentioned would not be able to explain the observed homological patterns. Network homology can in fact be
seen as the weighted complement to the perturbative dK-series approach [8]: the latter proceeds
by successive bottom-up constraints on k-body correlations, rapidly becoming very cumbersome, while
our method returns the complete superposition of the network's degree and weight correlation layers in
a non-perturbative (top-down) fashion.

A simple artificial network helps illustrate this point: Random Geometric Graphs (RGG) have
recently been shown to display long-range many-body correlations [24, 25]. We also find that they have
homological structures reminiscent of class I networks (Fig. 2a, b and c) and the same relation to their
randomized versions. Class I networks are the result of high-order coordination in a similar way. This is
also supported by the presence, in real networks and RGGs, of higher homology generators, which require
elaborate coordination patterns in order to appear. While these cycles almost disappear in randomized
versions of real-world networks, they are present in the case of RGGs.

For the latter and the airports, this organisation can be thought of as the result of the non-local
constraint imposed by the metric of the underlying space [26]. Although spatial constraints are harder to
fathom for social and genetic systems, alternative explanations are possible: for example, the homological
structure of the observed online communication and gene networks can be thought of as stemming from group
interactions among people (e.g. mailing lists, multi-user mails) and biological functions (e.g. pathways)
respectively, which provide an underlying non-local mechanism for the emergence of homological patterns.

Further evidence of this behavior can be found by zooming in on specific cycles which convey information
about underlying constraints hidden in the network weight-link connectivity patterns. For example, the
cycle structure of the air passenger network detects the expected reduced connectivity over oceans, in
the form of strong persistent cycles, and the strong backbone of US airport hubs, which is then filled
in by the local (intra-community) links (Fig. 1b). Another example can be found in the school children's
face-to-face contact network. As expected, we find that the most significant cycles link together different
school classes (yellow and pink cycles in Fig. 1c). However, we also find that a school class (green nodes),
despite being both a network community and a 3-clique component [28], is characterized by a strong
internal $H_1$ generator, which might reflect peculiar social dynamics coming from different seating
arrangements or schedules for part of the class [29].



2.3 Spectral correlates of homology classes 

At the opposite extreme from local quantities lie the spectral properties of networks. It is therefore
important to investigate whether the two classes have distinctive spectral signatures.
Network eigenvalues, especially those of the Laplacian matrix, figure prominently in a number of
applications, ranging from spectral clustering [30] to the propensity of a set of oscillators
distributed on the nodes to synchronize [31]. Given a graph $G$, we denote its adjacency matrix by $A(G)$ and
its Laplacian matrix by $L(G) = D - A(G)$, where $d_{ij} = \delta_{ij} \sum_k a_{ik}$. For a symmetric network with $N$ nodes,
$A(G)$ has a set of real eigenvalues $\lambda_1 \geq \lambda_2 \geq \ldots \geq \lambda_{N-1} \geq \lambda_N$. The spectral gap
$\Delta\lambda_A = \lambda_1 - \lambda_2$ and its normalized version, $R_A = \frac{\lambda_1 - \lambda_2}{\lambda_2 - \lambda_N}$, effectively measure how far the
leading eigenvalue lies from the bulk of the eigenvalue distribution [32].

Interestingly, we find that class I networks have significantly larger spectral gaps (p < 0.05 comparing
the distributions) than class II networks (panel IV in Fig. 2a). Despite being somewhat neglected in the
complex networks literature, $\Delta\lambda_A$ has been linked to the notion of natural connectivity [33], which encodes
spectral information about network redundancy in terms of the number of closed paths and is defined
as $\bar{\lambda} = \log\left(\frac{1}{N} \sum_{i=1}^{N} e^{\lambda_i}\right)$. Rewriting
$\bar{\lambda} = \lambda_1 + \log\left[\frac{1}{N}\left(1 + \sum_{i=2}^{N} e^{\lambda_i - \lambda_1}\right)\right]$, it is easy to see that for large
gaps all the terms in the sum are exponentially suppressed and therefore $\bar{\lambda}$ is essentially dominated by
the leading adjacency eigenvalue modulo a size effect, $\bar{\lambda} \simeq \lambda_1 - \log N$. This result is consistent with the
nested cycle structure that we highlighted in class I. More importantly, we find a difference between the
two classes in the topological constraints on synchronization processes. For the Laplacian $L(G)$, label
the set of eigenvalues $0 = \lambda_1 \leq \lambda_2 \leq \lambda_3 \leq \cdots \leq \lambda_N$ and define the Laplacian eigenratio $R_L = \frac{\lambda_N}{\lambda_2}$.
Barahona and Pecora [20] showed that a set of dynamical systems, placed on the network's nodes and
coupled according to the graph adjacency with a global coupling $\sigma$, has a linearly stable synchronous
state if

$R_L < \beta$ (2.3)

where $\beta$ is a purely dynamical parameter. This inequality implies that networks displaying very large $R_L$
are hard (or impossible) to synchronize. Panel IVb of Fig. 2 again shows a significant difference between
the two classes: class I networks have much larger eigenratios, making them hard to synchronize.
Our results therefore show a deep connection between the homological network structure, the network
spectral properties and their implications for network dynamics. Indeed, the role of mesoscopic structures
in the stability and evolution of dynamical systems on networks is gradually emerging, as shown for
example by recent work based on the concepts of basic symmetric subgraphs and their legacy eigenvalues
in the global network spectrum [21], and is being shaped by algebraic methods, which are well suited to
capturing the geometric information hidden within the network fabric.
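
The spectral quantities used in this section are simple functions of the eigenvalues, and the sketch below computes them from precomputed eigenvalue lists in plain Python. The function names are hypothetical, the normalized gap is taken to be $(\lambda_1 - \lambda_2)/(\lambda_2 - \lambda_N)$, and the natural connectivity is evaluated in the rewritten form given in the text; the example spectra are the well-known adjacency and Laplacian spectra of the 4-cycle.

```python
import math
from typing import Sequence, Tuple


def adjacency_gap(eigs: Sequence[float]) -> Tuple[float, float]:
    """Return (spectral gap, normalized gap R_A) of an adjacency spectrum."""
    lam = sorted(eigs, reverse=True)
    gap = lam[0] - lam[1]
    return gap, gap / (lam[1] - lam[-1])


def natural_connectivity(eigs: Sequence[float]) -> float:
    """log((1/N) * sum_i exp(lam_i)), evaluated as lam_1 + log[(1/N) * sum_i exp(lam_i - lam_1)]."""
    lam1 = max(eigs)
    return lam1 + math.log(sum(math.exp(x - lam1) for x in eigs) / len(eigs))


def laplacian_eigenratio(lap_eigs: Sequence[float]) -> float:
    """R_L = largest Laplacian eigenvalue / smallest non-trivial one (connected graph assumed)."""
    mu = sorted(lap_eigs)
    return mu[-1] / mu[1]


# Spectra of the 4-cycle: adjacency {2, 0, 0, -2}, Laplacian {0, 2, 2, 4}.
gap, r_a = adjacency_gap([2.0, 0.0, 0.0, -2.0])
nat = natural_connectivity([2.0, 0.0, 0.0, -2.0])
r_l = laplacian_eigenratio([0.0, 2.0, 2.0, 4.0])
print(gap, r_a, nat, r_l)
```

Factoring out the leading eigenvalue before exponentiating avoids overflow for large spectra and makes the large-gap limit visible: when the remaining exponentials are negligible, the result reduces to the leading eigenvalue minus log N.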

3 Conclusions 

Hitherto, the homological structure of weighted networks could not be systematically studied. Our
method, grounded in computational topology, allows one to probe multiple layers of organized structure. It
highlighted two classes of networks distinguished by their homological features, which we interpreted as
caused by differences in higher-order network organisation that are not captured by (quasi) local
approaches.

Among the many possible applications, two very relevant ones for social and infrastructural networks are
the study of the weighted rich club's geometry beyond the aggregate measure [22, 34], and the
generalisation of network embedding models to include homological information [35]. Furthermore, the two
classes also displayed a marked difference in their spectral gap distributions, and in particular in the
values of the algebraic connectivity, implying that the different homological structures are correlated
with different synchronizability thresholds.

This work therefore provides a stepping stone towards understanding the coupling between network dy- 
namical processes and the network's homology. 

Finally, the filtration's construction rule is flexible and can be readily adapted to other problems. Much
like changing goggles, different edge metrics can be used (e.g. betweenness or salience [36]), the
thresholding method varied (e.g. local thresholding [14]), or the filtration promoted to a filtering on two
quantities (e.g. edge weight and time in a temporal network) using multi-persistent homology [37].



Methods and Materials 
Persistent homology 

The method we use to uncover weighted holes is persistent homology of the weight rank clique filtration.
In this section we briefly explain persistent homology and its realization through the weight rank
clique filtration.

Persistent homology is a technique from computational algebraic topology that can be viewed as a
parametrized version of simplicial homology [38]. The two definitions needed for simplicial homology are
those of simplicial complex and homology. A simplicial complex is a non-empty family $X$ of finite subsets,
called faces, of a vertex set with the two constraints:

- a subset of a face in $X$ is a face in $X$,
- the intersection of any two faces in $X$ is either a face of both or empty.

We assume that the vertex set is finite and totally ordered. A face of $n + 1$ vertices is called an n-face and
denoted by $[p_0, \ldots, p_n]$. The interpretation of low-dimensional faces is intuitive: a 0-face is a vertex,
a 1-face is a segment, a 2-face is a full triangle, a 3-face is a full tetrahedron. The dimension of a
simplicial complex is the highest dimension of the faces in the complex.

Morphisms between simplicial complexes are called simplicial maps. A simplicial map is a map between
simplicial complexes with the property that the image of a vertex is a vertex and the image of an n-face
is a face of dimension $\leq n$.

Simplicial homology with coefficients in a field is a functor from the category of simplicial complexes to
the category of vector spaces [38]. Homology of dimension $n$ assigns to each simplicial complex $X$ the
vector space $H_n(X)$ of n-cycles modulo boundaries, and to every simplicial map $f: X \to Y$ the linear map
$H_n(f): H_n(X) \to H_n(Y)$.

The construction that leads to the vector space $H_n$ is the following. Given a simplicial complex $X$ of
dimension $d$, consider the vector spaces $C_n$ on the set of n-faces in $X$ for $0 \leq n \leq d$. Elements in $C_n$
are called n-chains. The linear maps sending an n-face to the alternating sum of its (n - 1)-faces,

$$\partial_n : C_n \to C_{n-1}, \qquad [p_0, \ldots, p_n] \mapsto \sum_{i=0}^{n} (-1)^i [p_0, \ldots, p_{i-1}, p_{i+1}, \ldots, p_n],$$

share the property $\partial_{n-1} \circ \partial_n = 0$.
The subspace $\ker \partial_n$ of $C_n$ is called the vector space of n-cycles and denoted by $Z_n$. The
subspace $\mathrm{Im}\, \partial_{n+1}$ of $C_n$ is called the vector space of n-boundaries and denoted by $B_n$. Note that from
$\partial_{n-1} \circ \partial_n = 0$ it follows that $B_n \subseteq Z_n$ for all $n$.

The n-th simplicial homology group of $X$, with coefficients in the chosen field, is the vector space $H_n := Z_n / B_n$.
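
The boundary operator and the identity that the composition of two consecutive boundaries vanishes can be checked directly on a small example. The following Python sketch represents a chain as a dictionary from faces (sorted vertex tuples) to coefficients; integer coefficients are used here for readability as a stand-in for a field of characteristic zero, and the function name `boundary` is a choice made for this illustration.

```python
from collections import defaultdict
from typing import Dict, Tuple

Face = Tuple[int, ...]
Chain = Dict[Face, int]


def boundary(chain: Chain) -> Chain:
    """Alternating sum of the codimension-one faces of every face in the chain."""
    out: Dict[Face, int] = defaultdict(int)
    for face, coeff in chain.items():
        if len(face) == 1:
            continue  # the boundary of a vertex is zero
        for i in range(len(face)):
            # Drop the i-th vertex with sign (-1)^i.
            out[face[:i] + face[i + 1:]] += (-1) ** i * coeff
    return {f: c for f, c in out.items() if c != 0}


triangle = {(0, 1, 2): 1}
edges = boundary(triangle)
print(edges)            # {(1, 2): 1, (0, 2): -1, (0, 1): 1}
print(boundary(edges))  # {}

# A hollow square 0-1-2-3 is a 1-cycle: its boundary vanishes.
loop = {(0, 1): 1, (1, 2): 1, (2, 3): 1, (0, 3): -1}
print(boundary(loop))   # {}
```

The hollow square is a cycle that is not the boundary of anything unless triangles fill it, which is exactly the situation that a weighted hole captures.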

Persistent homology is the homology of a filtration, i.e. an increasing sequence of simplicial complexes

$X_0 \subseteq X_1 \subseteq \ldots \subseteq X_n = X$,

as opposed to that of a single simplicial complex.

It assigns to a filtration the homology groups of the simplicial complexes $H_n(X_v)$ and the linear maps
$i_{v,w} : H_n(X_v) \to H_n(X_w)$ induced in homology by the inclusions $X_v \hookrightarrow X_w$ for all $v < w$. Note that the
linear maps $i_{v,v+1}$ are not always injective, meaning that some homological features can disappear along
the filtration. These features are encoded by the persistent homology generators: a generator is an element
$g \in H_n(X_v)$ such that there is no $h \in H_n(X_w)$ for $w < v$ with the property that $i_{w,v}(h) = g$. Two indices
completely determine a generator $g \in H_n(X)$, namely its birth $\beta_g$ and its death $\delta_g$. The index $\beta_g$ is the
first index at which $g$ is in the filtration, and $\delta_g$ is the index of the simplicial complex in which the
cycle becomes a boundary (i.e. disappears homologically). The persistence (lifetime) of a generator is measured
by $p_g := \delta_g - \beta_g$. The length of a cycle, that is the number of faces composing it, is denoted by $\lambda_g$.
For each homology group, the information about the filtration is collected in a barcode: the set of intervals
$[\beta_g, \delta_g]$ for all generators $g \in H_n$, which constitutes a handy complete invariant of $H_n$ [16]. An alternative
way to represent the persistent homology of a filtration is through persistence diagrams [16, 39], which we
use extensively in the SI. A persistence diagram is a set of points in the plane counted with multiplicity.
It can be recovered from the barcode by considering the points $(\beta_g, \delta_g) \in \mathbb{R}^2$ with multiplicity given by the
number of generators with the same persistence interval. In the SI, the reader can find $H_1$ persistence
diagrams of the real-world datasets examined for the classification, together with the explicit comparison
to the results for their relevant randomized versions.
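
The passage from a barcode to a persistence diagram described above amounts to counting repeated intervals. A minimal Python sketch follows; the barcode values are hypothetical, chosen only to show the multiplicity, and the helper name is an assumption of this example.

```python
from collections import Counter
from typing import Dict, List, Tuple

Interval = Tuple[float, float]  # (birth, death), both normalized to the unit interval


def persistence_diagram(barcode: List[Interval]) -> Dict[Interval, int]:
    """Map each point (birth, death) to the number of generators sharing that interval."""
    return Counter(barcode)


# Hypothetical barcode with one repeated interval.
barcode = [(0.1, 0.4), (0.1, 0.4), (0.3, 0.9), (0.5, 1.0)]
diagram = persistence_diagram(barcode)
for (birth, death), multiplicity in sorted(diagram.items()):
    print(birth, death, multiplicity)
```

Points far from the diagonal birth = death correspond to long-lived generators, while points close to it correspond to short-lived ones.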
Filtrations



In classical applications, the filtration is obtained from a point cloud using the Rips-Vietoris complex, and
persistent homology is used to uncover robust topological features of the point cloud. We instead use the
weight rank clique filtration to uncover properties deriving from the topology and weighted structure of
weighted networks.

Recalling that an (n + 1)-clique is a complete subgraph on $n + 1$ vertices, the clique complex is a simplicial
complex built from the cliques of a graph. Namely, there is an n-face in the simplicial complex for every
(n + 1)-clique in the graph. The compatibility relations are satisfied because subsets of cliques and
intersections of cliques are cliques themselves.
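
A brute-force construction of the clique complex can be written in a few lines. The sketch below enumerates all vertex subsets up to a chosen dimension and keeps those whose vertices are pairwise linked; it is meant only as an illustration for small graphs (the enumeration is exponential in the face dimension), and the function name and the `max_dim` parameter are assumptions of this example.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Tuple


def clique_complex(nodes: Iterable[int], edges: Iterable[Tuple[int, int]],
                   max_dim: int = 2) -> Dict[int, List[Tuple[int, ...]]]:
    """Faces of the clique complex, grouped by dimension (an n-face has n + 1 vertices)."""
    ordered = sorted(nodes)
    linked = {frozenset(e) for e in edges}
    faces: Dict[int, List[Tuple[int, ...]]] = {0: [(v,) for v in ordered]}
    for dim in range(1, max_dim + 1):
        faces[dim] = [
            c for c in combinations(ordered, dim + 1)
            # A vertex subset is a face exactly when all its pairs are links.
            if all(frozenset(p) in linked for p in combinations(c, 2))
        ]
    return faces


# A triangle 0-1-2 with a pendant link 2-3.
faces = clique_complex(range(4), [(0, 1), (1, 2), (0, 2), (2, 3)])
print(faces[1])  # [(0, 1), (0, 2), (1, 2), (2, 3)]
print(faces[2])  # [(0, 1, 2)]
```

Because every subset of a clique is again a clique, the family returned is automatically closed under taking subsets, as required of a simplicial complex.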

The weight rank clique filtration on a weighted network $\Omega$ combines the clique complex construction
with a thresholding on weights, following three main steps.

• Rank the weights of links from $\omega_{max}$ to $\omega_{min}$: the discrete parameter $\epsilon_t$ indexes the sequence.

• At each step $t$ of the decreasing edge ranking we consider the thresholded graph $G(\omega_{ij}, \epsilon_t)$, i.e. the
subgraph of $\Omega$ with links of weight larger than $\epsilon_t$.

• For each graph $G(\omega_{ij}, \epsilon_t)$ we build the clique complex $K(G, \epsilon_t)$.

The clique complexes are nested along the growth of t and determine the weight rank clique filtration. 
Note that this construction is in fact the clique complex of each element in the graph filtration. 
In particular, persistent one dimensional cycles in the weight rank clique filtration represent weighted 
loops with much weaker internal links. 
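
The whole construction can be illustrated on a toy network by tracking the number of independent one-dimensional holes (the dimension of $H_1$) of the clique complex as links are added in rank order. The sketch below is an assumption-laden illustration, not the authors' pipeline: it works with coefficients in GF(2), adds one link per step with ties broken arbitrarily, and computes only the dimension of $H_1$ rather than the persistent generators themselves. All function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[int, int]


def rank_gf2(rows: Iterable[int]) -> int:
    """Rank over GF(2) of a matrix whose rows are given as integer bitmasks."""
    basis: Dict[int, int] = {}  # leading bit -> reduced row
    for row in rows:
        while row:
            lead = row.bit_length() - 1
            if lead not in basis:
                basis[lead] = row
                break
            row ^= basis[lead]
    return len(basis)


def betti1(nodes: Iterable[int], edges: Iterable[Edge]) -> int:
    """dim H_1 of the clique complex of the graph, with coefficients in GF(2)."""
    nodes = sorted(nodes)
    edges = sorted(tuple(sorted(e)) for e in edges)
    v_index = {v: i for i, v in enumerate(nodes)}
    e_index = {e: i for i, e in enumerate(edges)}
    edge_set = set(edges)
    triangles = [t for t in combinations(nodes, 3)
                 if all(p in edge_set for p in combinations(t, 2))]
    # Boundary of an edge: its two endpoints; boundary of a triangle: its three edges.
    d1 = [(1 << v_index[a]) | (1 << v_index[b]) for a, b in edges]
    d2 = [sum(1 << e_index[p] for p in combinations(t, 2)) for t in triangles]
    # dim H_1 = dim ker d1 - dim im d2 = (#edges - rank d1) - rank d2.
    return len(edges) - rank_gf2(d1) - rank_gf2(d2)


def betti1_along_filtration(weights: Dict[Edge, float]) -> List[int]:
    """dim H_1 after adding each link in descending weight order."""
    nodes = {v for e in weights for v in e}
    ranked = sorted(weights, key=lambda e: -weights[e])
    return [betti1(nodes, ranked[: t + 1]) for t in range(len(ranked))]


# A square of strong links closed by a weak diagonal.
weights = {(0, 1): 5.0, (1, 2): 4.0, (2, 3): 3.0, (0, 3): 2.0, (0, 2): 1.0}
print(betti1_along_filtration(weights))  # [0, 0, 0, 1, 0]
```

In this example the hole is born when the fourth link closes the square and dies when the weak diagonal splits it into two 3-cliques, so it has birth rank 4, death rank 5 and length 4.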

There is a conceptual difference between interpreting the $H_1$ persistent homology of data with the
Rips-Vietoris filtration and the $H_1$ persistent homology of weighted networks with the weight rank clique
filtration. While in the first case persistent generators are relevant and considered features of the data,
short-lived cycles are more interesting for networks. This is because random networks, or randomisations of
real networks, display one-dimensional persistent generators at all scales, while short-lived generators
testify to the presence of local organisation properties on different scales.

Acknowledgments 

The authors acknowledge M. Rasetti for stimulating discussions. 







Figure 1. Weight rank clique filtration and homology of networks. (a) The weight rank
filtration proceeds from the bottom up. Weighted holes (colored) and cliques (gray) appear as links are
added. Weighted holes can branch into smaller holes, which then evolve independently, persisting
or dying along the filtration as links close them by 3-cliques. The cartoon shows two very
long-persistence holes (violet and purple) appearing quite early and living until the end, while the
largest hole (red) branches into three smaller holes, of which only one survives to the end of the filtration
(green). (b) A selection of weighted holes from the US air passenger network (year 2000). The node
colors represent the best modularity partition of the entire network. The cycles are all long-persistence
ones, chosen to represent different behaviors: for example, the Chicago-Los Angeles-San Jose-Seattle
cycle spans a large spatial distance, implying weaker connectivity across the cycle and within the region
encompassed by the cycle, while the cycle going east from New York connects the east coast to three
large European airports, and its persistence is due to the reduced connectivity across the Atlantic
Ocean. (c) A selection of the strongest cycles in the face-to-face contact network in a primary school
(see SI for details on the dataset). Node colors represent different classes in the school. Cycles are often
found across communities, since by definition they probe the presence of holes among network regions.
However, this is not the only information they convey. The cycle contained in a single community
(green) testifies to the presence of peculiar contact geometries even within dense community structures.

Figure 2. Statistical and spectral properties of $H_1$ generators. Box plots of the
distributions of persistences $\{p_g\}$ (panel a)), births $\{\beta_g\}$ (panel b)) and lengths $\{\lambda_g\}$ (panel c)) for the
1d cycles ($H_1$ generators) of real networks (black), reshuffled (white) and randomized (gray). The gray
and green shaded areas identify the two network classes described in the main text: class I is
significantly different from the random expectations, with shorter, less persistent cycles that appear
across the entire filtration; class II networks are not significantly different from the random versions,
with long cycles and late birth times in the filtration. The characteristics of class I networks imply a
stratification of cycles that betrays the presence of large, non-local organisation in the network
structure, which is not present in class II networks. For comparison, an example of an RGG network (600
nodes in the unit disk, linking distance 0.01), known to have higher-order degree correlations, had
edge weights set according to $\omega_{ij} \propto (k_i k_j)^{\theta}$, with $\theta = 1$ (linearly correlated weight RGG) and $\theta = 0$
(random weight RGG). In both cases, the distributions of the cycles' properties closely resemble those of
class I networks. Finally, panel d) reports the distribution of the adjacency spectral gaps $\Delta\lambda_A$ and $R_A$ (left
plot) and the Laplacian eigenratio $R_L$ (right plot). All the quantities show significant (p < 0.05)
differences between the two classes, implying that the homological structure affects the dynamical
properties of networks, e.g. the synchronizability threshold.
| Dataset (class)     | $h_1$  | $\tilde{h}_1$ | $h_1^{sh}$      | $\tilde{h}_1^{sh}$ | $h_1^{rnd}$     | $\tilde{h}_1^{rnd}$ | $h_2$   | $\tilde{h}_2$ |
|---------------------|--------|--------|-----------------|-------------------|-----------------|-------------------|---------|---------|
| Genes (I)           | 0.515  | 0.003  | 0.020 ± 0.001   | 0.0007 ± 0.00001  | 0.0151 ± 0.0004 | 0.00023 ± 0.00005 | 0.35    | 0.006   |
| Online forums (I)   | 0.175  | 0.001  | 0.355 ± 0.005   | 0.007 ± 0.001     | 0.325 ± 0.005   | 0.007 ± 0.001     | 0.02    | 0.0003  |
| US Air 2000 (I)     | 0.160  | 0.001  | 0.405 ± 0.005   | 0.0065 ± 0.0007   | 0.358 ± 0.006   | 0.0060 ± 0.0005   | 0.02    | 0.0003  |
| US Air 2002 (I)     | 0.186  | 0.0008 | 0.39 ± 0.01     | 0.0037 ± 0.0003   | 0.34 ± 0.01     | 0.0034 ± 0.0003   | 0.23    | 0.002   |
| US Air 2006 (I)     | 0.167  | 0.0005 | 0.398 ± 0.005   | 0.0036 ± 0.0005   | 0.348 ± 0.008   | 0.0032 ± 0.0003   | 0.165   | 0.001   |
| US Air 2011 (I)     | 0.181  | 0.0006 | 0.41 ± 0.01     | 0.0034 ± 0.0002   | 0.35 ± 0.01     | 0.0033 ± 0.0003   | 0.076   | 0.0007  |
| Online messages (I) | 0.21   | 0.0014 | 0.190 ± 0.002   | 0.0017 ± 0.0001   | 0.185 ± 0.002   | 0.0015 ± 0.0001   | 0.02    | 0.0003  |
| School day 1 (II)   | 0.088  | 0.0034 | 0.113 ± 0.002   | 0.007 ± 0.001     | 0.093 ± 0.002   | 0.006 ± 0.001     | 0.015   | 0.0012  |
| School day 2 (II)   | 0.090  | 0.0033 | 0.115 ± 0.002   | 0.0065 ± 0.0005   | 0.098 ± 0.003   | 0.0089 ± 0.0008   | 0.01412 | 0.00095 |
| C. elegans (II)     | 0.0784 | 0.002  | 0.0745 ± 0.0017 | 0.001 ± 0.0001    | 0.0896 ± 0.0023 | 0.0041 ± 0.0005   | 0.058   | 0.002   |
| Twitter (II)        | 0.03   | 0.0001 | 0.030 ± 0.001   | 0.0002 ± 0.0001   | 0.029 ± 0.001   | 0.0002 ± 0.0001   | 0.01    | 0.0001  |
| Hep-th (II)         | 0.08   | 0.0002 | 0.075 ± 0.001   | 0.0002 ± 0.0001   | 0.0508 ± 0.0003 | 0.0002 ± 0.0001   |         |         |
| Cond-mat (II)       | 0.26   | 0.0004 | 0.20 ± 0.003    | 0.0002 ± 0.0001   | 0.180 ± 0.002   | 0.0005 ± 0.0001   |         |         |
| Lin. RGG            | 0.227  | 0.003  | 0.368 ± 0.005   | 0.006 ± 0.001     | 0.355 ± 0.002   | 0.012 ± 0.001     | 0.28    | 0.006   |
| Ran. RGG            | 0.3    | 0.0041 | 0.299 ± 0.005   | 0.0045 ± 0.0002   | 0.649 ± 0.40    | 0.015 ± 0.001     | 0.115   | 0.003   |


Table 1. Summary of hollowness values. For each dataset, we report the values of the hollowness
$h_1$ and cycle-length normalized hollowness $\tilde{h}_1$ for $H_1$ cycles for real networks and their randomisations
(sh and rnd). Most networks (class I in particular) show lower values than their randomized
versions. We also report the values of the hollowness $h_2$ and cycle-length normalized hollowness $\tilde{h}_2$ for
$H_2$ cycles for real networks. The values for the randomized networks are not reported because, strikingly,
the randomisations do not display any higher homology, while almost all real networks display positive
values of the $H_2$ hollowness.
Supplementary Information for "Topological strata of weighted
complex networks"

The Supplementary Information is organized in four sections. Section I contains definitions and
references concerning persistent homology, our main tool. In section II some constructions for filtrations
are presented; in particular, the weight rank clique filtration is introduced. In section III we describe the
datasets tested for our main result, the classification of networks based on persistent $H_1$ generators. In
section IV the reader can find more details on the classification and the plots supporting the result.

Persistent homology 

This section is devoted to the mathematical framework supporting persistent homology [1] [2] [3].
Persistent homology can be viewed as a parametrized version of simplicial homology, so it requires the
definitions of simplicial complex and homology; for detailed information we refer to [4].

Definition 3.1. A simplicial complex is a non empty family X of finite subsets, called faces, of a vertex 
set with the two constraints: 

- a subset of a face in X is a face in X, 

- the intersection of any two faces in X is a face of both. 

We assume that the vertex set is finite and totally ordered. A face of $n + 1$ vertices is called an
$n$-face and denoted by $[p_0, \dots, p_n]$. A 0-face is a vertex, a 1-face is a segment, a 2-face is a
full triangle, a 3-face is a full tetrahedron. The dimension of a simplicial complex is the highest
dimension of the faces in the complex.

Example 3.2. The clique complex is a simplicial complex constructed from a graph. There is an $n$-face
in the simplicial complex for every $(n+1)$-clique in the graph, i.e. a complete subgraph on $n + 1$
vertices. The compatibility relations are satisfied because subsets of cliques and intersections of
cliques are cliques themselves.
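
As an illustration of Example 3.2, the following minimal Python sketch enumerates the faces of the clique complex of a small graph. The function name `clique_complex`, the `max_dim` cut-off and the toy graph are assumptions made for this example only; the brute-force enumeration is meant for small graphs and is not the procedure used for the datasets.

```python
from itertools import combinations

def clique_complex(vertices, edges, max_dim=2):
    """Return the faces of the clique complex up to dimension max_dim.

    faces[n] is the list of n-faces, each a sorted tuple of n + 1 vertices.
    Brute-force enumeration: suitable for small illustrative graphs only.
    """
    adjacent = {frozenset(e) for e in edges}
    faces = {}
    for n in range(max_dim + 1):
        faces[n] = [
            c for c in combinations(sorted(vertices), n + 1)
            # c is an n-face iff every pair of its vertices is linked
            if all(frozenset(pair) in adjacent for pair in combinations(c, 2))
        ]
    return faces

# Toy graph (hypothetical): a triangle 0-1-2 attached to a square 2-3-4-5.
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (2, 5)]
faces = clique_complex(range(6), edges)
```

In this toy graph the triangle 0-1-2 is a 3-clique and therefore becomes a 2-face, whereas the square 2-3-4-5 has no diagonal and remains hollow.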

Morphisms between simplicial complexes are called simplicial maps.

Definition 3.3. A simplicial map is a map between simplicial complexes with the property that the image
of a vertex is a vertex and the image of an $n$-face is a face of dimension $\le n$.

Fix a field $k$. In the following, by vector space we mean a $k$-vector space, and $k[t]$ is the
polynomial ring in one variable with coefficients in $k$. Given a simplicial complex $X$ of dimension
$d$, consider the vector spaces $C_n$ with basis the set of $n$-faces in $X$, for $0 \le n \le d$.
Elements of $C_n$ are called $n$-chains. The linear maps

$$\partial_n : C_n \to C_{n-1},$$

sending an $n$-face to the alternating sum of its $(n-1)$-faces, share the property
$\partial_{n-1} \circ \partial_n = 0$.

The subspace $\ker \partial_n$ of $C_n$ is called the vector space of $n$-cycles and denoted by $Z_n$.
The subspace $\operatorname{Im} \partial_{n+1}$ of $C_n$ is called the vector space of $n$-boundaries and
denoted by $B_n$. Note that from $\partial_{n-1} \circ \partial_n = 0$ it follows that
$B_n \subseteq Z_n$ for all $n$.

Definition 3.4. The $n$-th simplicial homology group of $X$, with coefficients in $k$, is the vector space
$H_n := Z_n / B_n$.

The rank of $H_n$ is called the $n$-th Betti number of $X$.

The first few Betti numbers of $X$ have an easy intuitive meaning: the 0-th Betti number is the number
of connected components of $X$, the first Betti number is the number of two-dimensional (polygonal)
holes, and the second Betti number is the number of three-dimensional holes (convex polyhedra).
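
The definitions above can be made concrete with a short Python sketch that builds the boundary maps and computes Betti numbers as $\dim C_n - \operatorname{rank} \partial_n - \operatorname{rank} \partial_{n+1}$. It assumes the field $k = \mathbb{Z}/2$, so that signs in the alternating sum can be ignored; Betti numbers over another field may differ when torsion is present. All function names and the toy complex are hypothetical, and the complex is assumed to list all of its faces up to its top dimension.

```python
from itertools import combinations

def boundary_matrix(faces_n, faces_nm1):
    """Rows of the boundary map d_n over GF(2), one integer bitmask per n-face.

    Bit j of a row is set when the j-th (n-1)-face lies on the boundary of the
    n-face. Over GF(2) the alternating signs disappear, since -1 = 1.
    """
    index = {f: j for j, f in enumerate(faces_nm1)}
    rows = []
    for face in faces_n:
        mask = 0
        for sub in combinations(face, len(face) - 1):
            mask ^= 1 << index[sub]
        rows.append(mask)
    return rows

def rank_gf2(rows):
    """Rank of a GF(2) matrix given as a list of integer bitmasks."""
    basis = {}  # leading-bit position -> reduced row
    for row in rows:
        while row:
            top = row.bit_length() - 1
            if top not in basis:
                basis[top] = row
                break
            row ^= basis[top]
    return len(basis)

def betti_numbers(faces):
    """faces[n] lists the n-faces as sorted vertex tuples; returns [b_0, b_1, ...]."""
    top = max(faces)
    ranks = {0: 0, top + 1: 0}  # d_0 = 0 and there are no (top + 1)-faces
    for n in range(1, top + 1):
        ranks[n] = rank_gf2(boundary_matrix(faces[n], faces[n - 1]))
    return [len(faces[n]) - ranks[n] - ranks[n + 1] for n in range(top + 1)]

# Toy complex (hypothetical): a full triangle 0-1-2 attached to a hollow square 2-3-4-5.
faces = {
    0: [(v,) for v in range(6)],
    1: [(0, 1), (0, 2), (1, 2), (2, 3), (2, 5), (3, 4), (4, 5)],
    2: [(0, 1, 2)],
}
betti = betti_numbers(faces)
```

For this toy complex the sketch finds one connected component and one hole (the square), while the full triangle contributes no homology.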

It is fundamental to note that homology is a functor; this implies the following proposition.

Proposition 3.5. Let $X$ and $Y$ be two simplicial complexes. A simplicial map $f : X \to Y$ determines a
linear map between the homology groups $H_i(f) : H_i(X) \to H_i(Y)$ for all $i$.

The starting point in persistent homology is a filtration. As in [2], we call a simplicial complex $X$
filtered if we are given a family of subcomplexes $\{X_v\}$ parametrized by $\mathbb{N}$, such that
$X_v \subseteq X_w$ whenever $v < w$. The family $\{X_v\}$ is called a filtration. There are many ways to
construct a filtration from a point cloud or a network; some relevant ones are explained in section II.

Definition 3.6. The persistent homology module of a filtration is given by the homology groups
$H_n(X_v)$ of the simplicial complexes $X_v$ and the linear maps $i_{v,w} : H_n(X_v) \to H_n(X_w)$
induced in homology by the inclusions $X_v \hookrightarrow X_w$ for all $v < w$.

Following [2], this system is called a module because the vector space
$\mathcal{H}_n = \bigoplus_v H_n(X_v)$ can actually be endowed with a $k[t]$-module structure, defining
$t \cdot m := i_{v,v+1}(m)$ for $m \in H_n(X_v)$. Note that the linear maps $i_{v,v+1}$ are not always
injective. A persistent homology generator is a generator of $\mathcal{H}_n$ according to the
$k[t]$-structure, i.e. an element $g \in H_n(X_v)$ such that there is no $h \in H_n(X_w)$ for $w < v$ with
the property that $t^{v-w} h = g$. By the structure theorem for modules over a PID, $k[t]$-modules are
completely determined by the degree of each generator $g$ (birth of the generator, $\beta_g$) and the
degree in which the generator is annihilated by the module action (death of the generator, $\delta_g$).
The persistence (lifetime) of a generator is measured by $p_g := \delta_g - \beta_g$. The length of a
cycle, i.e. the number of faces composing it, is denoted by $\lambda_g$.

The barcode of a filtration is the set of intervals $[\beta_g; \delta_g]$ for all generators
$g \in \mathcal{H}_n$; this is a handy complete invariant of $\mathcal{H}_n$ [2]. By persistent
topological features we mean generators of $\mathcal{H}_n$ such that the interval $[\beta_g; \delta_g]$
is large with respect to the filtration length.

An alternative way to represent persistent homology modules is the persistence diagram [3], [5]. A
persistence diagram is a set of points in the plane counted with multiplicity; it can be recovered from
the barcode by considering the points $(\beta_g, \delta_g) \in \mathbb{R}^2$ with multiplicity given by
the number of generators with the same persistence interval.
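
The passage from barcode to persistence diagram can be sketched in a few lines of Python. The barcode below is made up for illustration and the function names are hypothetical; intervals are represented as (birth, death) pairs of filtration indices.

```python
from collections import Counter

def persistence_diagram(barcode):
    """Map each (birth, death) point to its multiplicity in the barcode."""
    return Counter(barcode)

def persistences(barcode):
    """Persistence p_g = death - birth of every generator, in barcode order."""
    return [death - birth for birth, death in barcode]

# Made-up barcode: two generators share the interval [1; 2].
barcode = [(0, 3), (1, 2), (1, 2), (4, 9)]
diagram = persistence_diagram(barcode)
lifetimes = persistences(barcode)
```

Here the point (1, 2) appears in the diagram with multiplicity 2, since two generators share that interval.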

Persistent homology modules can be computed using the libraries javaPlex (Java) or Dionysus (C++),
which are both available from the website of Stanford's CompTop group (http://comptop.stanford.edu/),
and presented using the barcode or the persistence diagram. We developed a Python module to wrap
the javaPlex library, consisting of a number of scripts able to preprocess complex networks and store the
resulting homological information in a manageable form.

Filtrations 

In this section we will go through some basic constructions that generate a filtration starting from a point 
cloud or a complex network. 

The most popular filtration for data analysis is the Rips-Vietoris filtration [2]. The Rips-Vietoris
complex is a simplicial complex associated to a set of points in a metric space in the following way:
every point $p$ is the center of a ball $D(p, \epsilon)$ of radius $\epsilon$, and $n + 1$ points
$\{p_0, \dots, p_n\}$ determine an $n$-face in the Rips-Vietoris complex if the corresponding balls of
radius $\epsilon$ intersect two by two, i.e. $D(p_i, \epsilon) \cap D(p_j, \epsilon) \neq \emptyset$ for
all $i \neq j \in \{0, \dots, n\}$. Clearly the Rips-Vietoris complex depends on the parameter
$\epsilon$, and if $\epsilon_1 < \epsilon_2$ the complex with balls of radius $\epsilon_1$ is contained in
the complex with balls of radius $\epsilon_2$. As $\epsilon$ grows we obtain an increasing sequence of
simplicial complexes, a filtration, called the Rips-Vietoris filtration. In this context persistent
topological features of the filtration are considered as features of the point cloud.

For unweighted networks, the Clique filtration is used in [6] to analyse the difference between the
barcodes of random networks, networks with exponential connectivity distribution and scale-free
networks. The $k$-skeleton $X_k$ of a simplicial complex $X$ is the subcomplex of $X$ containing all the
faces of dimension smaller than or equal to $k$. Consider a complex network and the corresponding clique
complex $X$; the clique filtration is obtained by filtering the clique complex according to the
dimension of the skeleton:

$$X_0 \subseteq X_1 \subseteq X_2 \subseteq \dots \subseteq X.$$

Note that persistent features of the Clique filtration are generators of the homology groups of the
clique complex. These generators can be directly calculated from the clique complex of the graph, thus
the filtration gives no extra information. This is not the case for the following filtration, which we
have introduced for weighted networks: its persistent features cannot be determined from a single
simplicial complex in the family, but instead reveal the intricate multiscale relation between weights
and links in a weighted undirected network.

The Weight Rank Clique filtration on a weighted network $\Omega$ combines the clique complex
construction with a thresholding on weights. The first step is to rank the weights of links from
$\omega_{\max}$ to $\omega_{\min}$: the discrete parameter $\epsilon_t$ scans the sequence. At each step
$t$ of the decreasing edge ranking we consider the thresholded graph $G(\omega_{ij}, \epsilon_t)$, i.e.
the subgraph of $\Omega$ with links of weight larger than $\epsilon_t$. For each graph
$G(\omega_{ij}, \epsilon_t)$ we build the clique complex $K(G, \epsilon_t)$. The clique complexes are
nested as $t$ grows and determine the weight rank clique filtration. Persistent one-dimensional cycles
represent weighted loops with much weaker internal links.
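
The construction above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the module used for the analysis: the function name, the `max_dim` cut-off and the toy weights are hypothetical, all vertices are assumed to be present from step 0, and a link is assumed to enter at the step whose threshold equals its own weight (so that the weakest links also enter). Each face is assigned the step at which its weakest link appears.

```python
from itertools import combinations

def weight_rank_clique_filtration(weights, max_dim=2):
    """Return sorted (step, face) pairs for the weight rank clique filtration.

    weights maps an undirected edge (i, j) to its weight. Distinct weights are
    ranked in decreasing order; a face enters at the rank of its weakest edge.
    """
    w = {frozenset(e): x for e, x in weights.items()}
    rank = {x: t for t, x in enumerate(sorted(set(w.values()), reverse=True))}
    vertices = sorted(set().union(*w))
    filtration = [(0, (v,)) for v in vertices]  # assumption: vertices at step 0
    for n in range(1, max_dim + 1):
        for face in combinations(vertices, n + 1):
            pairs = [frozenset(p) for p in combinations(face, 2)]
            if all(p in w for p in pairs):  # face is a clique of the full graph
                step = max(rank[w[p]] for p in pairs)
                filtration.append((step, face))
    # Within a step, lower-dimensional faces come first, so every prefix of the
    # list is a simplicial complex.
    return sorted(filtration, key=lambda item: (item[0], len(item[1]), item[1]))

# Toy network (hypothetical): a strong square 0-1-2-3 with a weak diagonal 0-2.
weights = {(0, 1): 5.0, (1, 2): 4.0, (2, 3): 4.0, (0, 3): 3.0, (0, 2): 1.0}
filtration = weight_rank_clique_filtration(weights)
steps = {face: step for step, face in filtration}
```

In the toy network the square closes at step 2, when the link (0, 3) enters, and is filled only at step 3, when the weak diagonal creates the two triangles: a loop with a much weaker internal link. The resulting list of faces with entry steps is the kind of input that a persistent homology library such as javaPlex or Dionysus takes to compute barcodes.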

Datasets 

The datasets analysed in this paper cover a broad range of fields, spanning social, infrastructural and
biological networks. In detail they are:

US Air passenger networks The networks refer to the years 2000, 2002, 2006 and 2011. The years
were chosen to provide snapshots of the air traffic situation at 4-5 year intervals, plus one extra
(year 2000) just before the events of 9/11, which significantly affected the air transportation industry.
The data used are publicly available from the website of the Bureau of Transportation Statistics
(http://www.transtats.bts.gov/). Individual flights between airports were aggregated on routes
as defined by origin and destination cities. The weight reported is the yearly aggregated passenger
traffic.

C. Elegans The network is available at http://cdg.columbia.edu/cdg/datasets and reports a weighted,
directed representation of the C. elegans neuronal network [7]. The network was symmetrized
by summing the weights present on edges between the same nodes (given $\omega_{ij}$ and $\omega_{ji}$,
$\omega^{symm}_{ij} = \omega_{ij} + \omega_{ji}$).

Online Messages and Forums The online messages network consists of messages in a student online
community at University of California [8]. The online forum network refers to the same online
community, but focuses on the activity of users in public forums, rather than on private messages
[9]. Both networks are publicly available online at Tore Opsahl's website
(http://toreopsahl.com/datasets/).

Gene network The gene interaction network used in the paper is a sampling of the complete human
genome dataset available from the University of Florida Sparse Matrix Collection. Each node is
an individual gene, while each edge correlates the expression level of a gene with that of another gene
(using a NIR score [10]). The node set of the analysed network was obtained by randomly choosing
an origin node, then adding its neighborhood to the node set; the neighborhoods of the newly added
nodes were then added to the node set recursively until a given number of nodes was obtained (in
the case used, the target number of nodes was $N = 1300$). Then all the edges present in the original
network between the nodes in the node set were added, effectively taking a connected subgraph
of the original network. To reduce the computational complexity due to the large density of the
graph, the weight rank clique filtration was stopped at an edge weight of 0.09 (similarly to the choice
made in [11]).

Twitter The dataset consists of a network of mentions and retweets between Twitter users and is
available online on the Gephi dataset page (http://wiki.gephi.org/index.php/Datasets). Weights are
proportional to the number of interactions between a pair of users.

School face-to-face contact network The dataset contains two days of recorded face-to-face
interactions in a primary school. Each node represents a child, with the edge weight between two nodes
being proportional to the amount of time the two children spent face to face. We analysed the two
days separately, yielding two networks. The dataset has been collected by the SocioPatterns project
(http://www.sociopatterns.org/) and analysed in [12].

Co-authorship networks The networks analysed are the weighted co-authorship networks of the Con- 
densed Matter E-print Archive between 1995 and 1999 (cond-mat) and the High-Energy Theory 
E-print Archive between 1995 and 1999 (hep-th) [13]. 

Finally, for comparison we use Random Geometric Graphs (RGG) [14, 15], which are simple models
of spatial networks: an RGG is generated by sprinkling $N$ nodes randomly on a metric space that acts
as a substrate (usually a disk of unitary radius or a square with identified edges), and then linking
nodes that are closer than a given linking distance $d$.
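
A minimal Python sketch of this construction is given below, assuming the unit square as substrate, optionally with opposite edges identified. The two weighting schemes mirror those described in the captions of Figures S.16 and S.17 (uniform random weights, or the product of the end-node degrees times a uniform random variable); the function names, parameters and seeds are hypothetical, and `random.random()` samples from [0, 1) rather than the open interval.

```python
import math
import random

def random_geometric_graph(n, d, seed=None, periodic=True):
    """Sprinkle n nodes uniformly on the unit square and link pairs closer than d.

    With periodic=True opposite edges of the square are identified (flat torus).
    The double loop is quadratic in n, which is fine for an illustration.
    """
    rng = random.Random(seed)
    pos = [(rng.random(), rng.random()) for _ in range(n)]

    def dist(p, q):
        dx, dy = abs(p[0] - q[0]), abs(p[1] - q[1])
        if periodic:
            dx, dy = min(dx, 1 - dx), min(dy, 1 - dy)
        return math.hypot(dx, dy)

    edges = [(i, j) for i in range(n) for j in range(i + 1, n)
             if dist(pos[i], pos[j]) < d]
    return pos, edges

def assign_weights(n, edges, seed=None, degree_correlated=False):
    """Weight each link by X, or by k_i * k_j * X, with X uniform in [0, 1)."""
    rng = random.Random(seed)
    degree = [0] * n
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    return {
        (i, j): (degree[i] * degree[j] if degree_correlated else 1) * rng.random()
        for i, j in edges
    }

# Illustrative sizes only, not the parameters used for the figures.
pos, edges = random_geometric_graph(60, 0.2, seed=1)
uniform_weights = assign_weights(60, edges, seed=2)
correlated_weights = assign_weights(60, edges, seed=2, degree_correlated=True)
```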

Note that the networks analysed in this article are all undirected and weighted, because the weight
rank clique filtration finds a natural application in such cases. However, schemes for directed networks
can be easily devised and tailored to specific case studies, e.g. one could adopt the definition used in
the directed clique percolation method [16] in order to associate network structures to simplices.

Results for weight rank clique filtration 

We recall that given a network $G$ on $N$ nodes, we consider the weight rank clique filtration on $G$.
Let $T$ be the length of the filtration, $\{g_i\}$ the set of generators of the $i$-th persistent homology
module of the filtration and $N_{g_i}$ the cardinality of $\{g_i\}$. For every generator, the index
$p_{g_i}$ is its persistence, the index $\lambda_{g_i}$ is its length and $\beta_{g_i}$ is its birth
index. For brevity, $H_1$ generators will be denoted by $g$ rather than $g_1$.

There is a conceptual difference in interpreting $H_1$ persistent homology of data with the
Rips-Vietoris filtration and $H_1$ persistent homology of weighted networks with the weight rank clique
filtration. While in the first case persistent generators are relevant and considered features of the
data, short cycles are more interesting for networks. This is because random networks, or randomisations
of real networks, display one-dimensional persistent generators at all scales, while short-lived
generators testify to the presence of local organisation properties on different scales.

As stated in the main text, the complex networks we considered fall in two main groups. 

Networks in class I display clear departures from their null counterparts, while class II networks show
homological features that are much closer to the randomized versions. We collected the complete
information about the indices $p_g$, $\lambda_g$ and $\beta_g$ for persistent $H_1$ generators within a
series of tableaux (Figures S.3 to S.17). In every figure, panel a) represents the distribution of
persistence $p_g$, panel b) the distribution of length $\lambda_g$ and panel c) the distribution of birth
index $\beta_g$. These quantities are studied for the homology generators in the real-world network (red
circles), after weight reshuffling of the network (blue squares) and in the network randomisation (green
triangles). Panel d) is the persistence diagram of the network under study, panel e) is the persistence
diagram of its weight reshuffled null model and panel f) is the persistence diagram of the random null
model.

From the perspective of persistence diagrams, class I presents a rich structure of nested cycles covering
all scales, as opposed to the weight reshuffled null model and the random null model, where generators
are born uniformly along the filtration and tend to be very persistent, producing largely hollow network
instances. The degree and weight sequences are preserved in the randomisations and therefore cannot
account for the differences in the homology. Another possible explanation for the different behavior of
the two classes could be the presence of degree-degree or weight-degree correlations in class I. However,
networks in the two classes do not show consistent patterns of assortativity: for example, class I
includes the gene network (assortative) and the airport networks (disassortative), while class II
includes the assortative co-authorship networks and the disassortative Twitter data. Weight-degree
correlations do not appear to be decisive either: for example, the RGGs generated with random edge
weights did not show significant differences from those generated with edge weights correlated
positively to the degrees of the end nodes (see Figs. S.16 and S.17).

Figure S.3. Summary of $H_1$ persistent homology results for the human gene interaction
network 2 (Class I).




Figure S.4. Summary of $H_1$ persistent homology results for online forum network of [9]
(Class I).

Figure S.6. Summary of $H_1$ persistent homology results for the US airways passenger
network for 2002 (Class I).

Figure S.7. Summary of $H_1$ persistent homology results for the US airways passenger
network for 2006 (Class I).

Figure S.8. Summary of $H_1$ persistent homology results for the US airways passenger
network for 2011 (Class I).

Figure S.9. Summary of $H_1$ persistent homology results for the online messages network
of [8] (Class I).

Figure S.10. Summary of $H_1$ persistent homology results for the day 1 face-to-face contact
duration network of children of [12] (Class II).

Figure S.11. Summary of $H_1$ persistent homology results for the day 2 face-to-face contact
duration network of children of [12] (Class II).

Figure S.12. Summary of $H_1$ persistent homology results for the neural network of the C.
elegans (Class II).

Figure S.13. Summary of $H_1$ persistent homology results for a network of mentions and
retweets of a part of the Twitter network (Class II).

Figure S.14. Summary of $H_1$ persistent homology results for the Hep-th arxiv (Class

Figure S.15. Summary of $H_1$ persistent homology results for the cond-mat (Class II)

Figure S.16. Summary of $H_1$ persistent homology results for the Random Geometric
Graph model with linear weight-degree correlations (Class I). The graph has $N = 600$ nodes
and a linking distance $d = 0.01$. The weight of a link between nodes $i$ and $j$ was set according to
$(k_i k_j) X$, with exponent equal to 1 (the linear case), where $X$ is a uniform random variable in
$(0, 1)$.

Figure S.17. Summary of $H_1$ persistent homology results for the Random Geometric
Graph model with random uniform weights (Class I). The graph has $N = 600$ nodes
and a linking distance $d = 0.01$. The weight of a link between nodes $i$ and $j$ was drawn at random
from a uniform distribution.